Optimized Feature Extraction and HMMs in Subword Detectors
Abstract
This paper presents methods and results for optimizing subword detectors in continuous speech. Such detectors are useful in areas such as detection-based ASR, pronunciation training, phonetic analysis, and word spotting. We build detectors for both articulatory features and phones by discriminatively training detector-specific MFCC filterbanks and HMMs. The resulting filterbanks differ clearly from one another and reflect the acoustic properties of their detection classes. On the TIMIT task, our detector-specific features reduce the average detection error rate by 20% compared to standard MFCCs.
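For orientation, the standard MFCC front end that serves as the baseline here uses a fixed bank of triangular filters spaced evenly on the mel scale. The sketch below builds such a bank with NumPy. It is not the paper's method: the function names, the 24-filter and 512-point FFT defaults, and the specific mel formula are illustrative assumptions; only the 16 kHz sampling rate matches TIMIT. In a detector-specific setup, the weight matrix returned by `mel_filterbank` is the kind of quantity that would be treated as trainable rather than fixed.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to mel (a common O'Shaughnessy-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters=24, n_fft=512, sample_rate=16000):
    """Return an (n_filters, n_fft // 2 + 1) matrix of triangular filter weights.

    All defaults are assumptions chosen for illustration.
    """
    n_bins = n_fft // 2 + 1
    # Centre frequency of every FFT bin, from 0 Hz up to the Nyquist frequency.
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges_hz = mel_to_hz(mel_edges)

    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # Triangle: the lower of the two slopes, clipped at zero outside the band.
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def log_filterbank_energies(power_spectrum, fbank, floor=1e-10):
    """Apply the filterbank to (n_frames, n_bins) power spectra and take logs."""
    energies = power_spectrum @ fbank.T
    return np.log(np.maximum(energies, floor))
```

Applying a DCT to the output of `log_filterbank_energies` would give conventional MFCCs; replacing the fixed triangular weights with weights learned per detector is the step the abstract refers to as detector-specific filterbanks.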
Similar resources
Design of Detectors for Automatic Speech Recognition
This thesis presents methods and results for optimizing subword detectors in continuous speech. Speech detectors are useful in areas such as detection-based ASR, pronunciation training, phonetic analysis, and word spotting. First, we propose a structure suitable for subword detection. This structure is based on the standard HMM framework, but in each detector the MFCC feature extractor and t...
Non-Speech Acoustic Event Detection Using
Non-speech acoustic event detection (AED) aims to recognize events that are relevant to human activities and associated with audio information. Much previous research has focused on a restricted set of highlight events and has relied heavily on ad hoc detectors for these events. This thesis focuses on using multimodal data in order to make non-speech acoustic event detection and classification tasks more r...
Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis
In standard approaches to hidden Markov model (HMM)-based speech synthesis, window coefficients for calculating dynamic features are pre-determined and fixed. This may not be optimal to capture various context-dependent dynamic characteristics in speech signals. This paper proposes a data-driven technique to estimate the window coefficients. They are optimized so as to maximize the likelihood o...
Feature extraction of hyperspectral images using boundary semi-labeled samples and hybrid criterion
Feature extraction is a very important preprocessing step for the classification of hyperspectral images. The linear discriminant analysis (LDA) method fails in small-sample-size situations. Moreover, LDA performs poorly on non-Gaussian data. Because LDA is optimized by a global criterion, it is not flexible enough to cope with multi-modal distributed data. We propose a new fea...
POST: parallel object-oriented speech toolkit
We give a short overview of POST, a parallel speech toolkit that is distributed as freeware to academic institutions. The underlying idea of POST is that large computational problems, like the ones involved in Automatic Speech Recognition (ASR), can be solved more cost-effectively by using the aggregate power and memory of many computers. In its current version (January 96) and amongst other thing...